- Causal Inference
- Overview
- Causal inference estimates the effect of an intervention on an outcome, going beyond correlation to establish genuine cause-and-effect relationships.
- When to Use
- Evaluating the impact of policy interventions or business decisions
- Estimating treatment effects when randomized experiments aren't feasible
- Controlling for confounding variables in observational data
- Determining if a marketing campaign or product change caused an outcome
- Analyzing heterogeneous treatment effects across different user segments
- Making causal claims from non-experimental data using propensity scores or instrumental variables
- Key Concepts
- Treatment: Intervention or exposure
- Outcome: Result or consequence
- Confounding: Variables affecting both treatment and outcome
- Causal Graph: Visual representation of causal relationships
- Treatment Effect: Impact of the intervention
- Selection Bias: Non-random treatment assignment
- Causal Methods
- Randomized Controlled Trials (RCT): Gold standard
- Propensity Score Matching: Balance treatment/control groups
- Difference-in-Differences: Before/after comparison
- Instrumental Variables: Handle endogeneity
- Causal Forests: Heterogeneous treatment effects
- Implementation with Python

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.preprocessing import StandardScaler
from scipy import stats
```
```python
# Generate observational data with confounding
np.random.seed(42)
n = 1000

# Confounder: age (affects both treatment and outcome)
age = np.random.uniform(25, 75, n)

# Treatment: training program (more likely for younger people)
treatment_prob = 0.3 + 0.3 * (75 - age) / 50  # inverse relationship with age
treatment = (np.random.uniform(0, 1, n) < treatment_prob).astype(int)

# Outcome: salary (affected by both treatment and age)
# True causal effect of treatment: +$5000
salary = 40000 + 500 * age + 5000 * treatment + np.random.normal(0, 10000, n)

df = pd.DataFrame({'age': age, 'treatment': treatment, 'salary': salary})
print("Observational Data Summary:")
print(df.describe())
print(f"\nTreatment Rate: {df['treatment'].mean():.1%}")
print(f"Average Salary (Control): ${df[df['treatment'] == 0]['salary'].mean():.0f}")
print(f"Average Salary (Treatment): ${df[df['treatment'] == 1]['salary'].mean():.0f}")
```
```python
# 1. Naive comparison (BIASED - ignores confounding)
naive_effect = (df[df['treatment'] == 1]['salary'].mean()
                - df[df['treatment'] == 0]['salary'].mean())
print(f"\n1. Naive Comparison: ${naive_effect:.0f} (BIASED)")
```
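The `scipy.stats` import in the listing is otherwise unused; one natural use for it (my addition, not in the original) is to test whether the naive gap is statistically distinguishable from zero. The data-generating process is regenerated below so the snippet runs standalone:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Same data-generating process as the main example (seed 42)
np.random.seed(42)
n = 1000
age = np.random.uniform(25, 75, n)
treatment = (np.random.uniform(0, 1, n) < 0.3 + 0.3 * (75 - age) / 50).astype(int)
salary = 40000 + 500 * age + 5000 * treatment + np.random.normal(0, 10000, n)
df = pd.DataFrame({'age': age, 'treatment': treatment, 'salary': salary})

# Welch's t-test on the raw treated/control salary gap
t_stat, p_value = stats.ttest_ind(df[df['treatment'] == 1]['salary'],
                                  df[df['treatment'] == 0]['salary'],
                                  equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value only says the group means differ; the difference is
# still confounded by age, so it is not the causal effect.
```

Statistical significance and causal validity are separate questions: the naive gap here is significant yet biased.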
```python
# 2. Regression adjustment (covariate adjustment)
X = df[['treatment', 'age']]
y = df['salary']
model = LinearRegression()
model.fit(X, y)
regression_effect = model.coef_[0]  # coefficient on treatment
print(f"\n2. Regression Adjustment: ${regression_effect:.0f}")
```
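A sanity check on the regression-adjustment number (my addition, not in the original listing) is the Frisch-Waugh-Lovell identity: residualizing both salary and treatment on age, then regressing residual on residual, must reproduce exactly the same treatment coefficient. A sketch under the same simulated data:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same data-generating process as the main example (seed 42)
np.random.seed(42)
n = 1000
age = np.random.uniform(25, 75, n)
treatment = (np.random.uniform(0, 1, n) < 0.3 + 0.3 * (75 - age) / 50).astype(int)
salary = 40000 + 500 * age + 5000 * treatment + np.random.normal(0, 10000, n)
df = pd.DataFrame({'age': age, 'treatment': treatment, 'salary': salary})

# Direct multivariate regression: coefficient on treatment
direct = LinearRegression().fit(df[['treatment', 'age']], df['salary']).coef_[0]

# FWL: regress age out of both outcome and treatment, then regress residuals
A = df[['age']]
res_y = df['salary'] - LinearRegression().fit(A, df['salary']).predict(A)
res_t = df['treatment'] - LinearRegression().fit(A, df['treatment']).predict(A)
fwl = LinearRegression().fit(res_t.to_frame(), res_y).coef_[0]

print(f"direct: {direct:.2f}, FWL: {fwl:.2f}")
```

The agreement makes explicit that regression adjustment estimates the treatment-salary association after the age channel has been removed.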
```python
# 3. Propensity score matching
# Estimate probability of treatment given covariates
ps_model = LogisticRegression()
ps_model.fit(df[['age']], df['treatment'])
df['propensity_score'] = ps_model.predict_proba(df[['age']])[:, 1]
print("\n3. Propensity Score Matching:")
print(f"PS range: [{df['propensity_score'].min():.3f}, {df['propensity_score'].max():.3f}]")

# Matching: find the closest control unit for each treated unit,
# within a +/-0.1 caliper on the propensity score
matched_pairs = []
treated_units = df[df['treatment'] == 1].index
for treated_idx in treated_units:
    treated_ps = df.loc[treated_idx, 'propensity_score']
    control_units = df[(df['treatment'] == 0)
                       & (df['propensity_score'] >= treated_ps - 0.1)
                       & (df['propensity_score'] <= treated_ps + 0.1)].index
    if len(control_units) > 0:
        closest_control = min(
            control_units,
            key=lambda x: abs(df.loc[x, 'propensity_score'] - treated_ps))
        matched_pairs.append({
            'treated_idx': treated_idx,
            'control_idx': closest_control,
            'treated_salary': df.loc[treated_idx, 'salary'],
            'control_salary': df.loc[closest_control, 'salary'],
        })

matched_df = pd.DataFrame(matched_pairs)
psm_effect = (matched_df['treated_salary'] - matched_df['control_salary']).mean()
print(f"PSM Effect: ${psm_effect:.0f}")
print(f"Matched pairs: {len(matched_df)}")
```
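The deliverables list at the end mentions a covariate balance assessment, which the listing never performs. A minimal sketch (my addition) computing the standardized mean difference of age before and after nearest-neighbor matching on the propensity score; the matching here is with replacement, a simplification of the caliper match above:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Same data-generating process as the main example (seed 42)
np.random.seed(42)
n = 1000
age = np.random.uniform(25, 75, n)
treatment = (np.random.uniform(0, 1, n) < 0.3 + 0.3 * (75 - age) / 50).astype(int)
salary = 40000 + 500 * age + 5000 * treatment + np.random.normal(0, 10000, n)
df = pd.DataFrame({'age': age, 'treatment': treatment, 'salary': salary})
df['ps'] = LogisticRegression().fit(df[['age']], df['treatment']).predict_proba(df[['age']])[:, 1]

def smd(treated, control):
    """Standardized mean difference for one covariate."""
    pooled_sd = np.sqrt((treated.var() + control.var()) / 2)
    return (treated.mean() - control.mean()) / pooled_sd

before = smd(df[df['treatment'] == 1]['age'], df[df['treatment'] == 0]['age'])

# Nearest-neighbor match on the propensity score (with replacement)
controls = df[df['treatment'] == 0]
matched_ages = []
for _, row in df[df['treatment'] == 1].iterrows():
    nearest = (controls['ps'] - row['ps']).abs().idxmin()
    matched_ages.append(controls.loc[nearest, 'age'])
after = smd(df[df['treatment'] == 1]['age'], pd.Series(matched_ages))

print(f"SMD(age) before matching: {before:.3f}")
print(f"SMD(age) after matching:  {after:.3f}")
```

A common rule of thumb is that |SMD| below 0.1 after matching indicates acceptable balance.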
```python
# 4. Stratification by propensity score (5 quantile strata)
df['ps_stratum'] = pd.qcut(df['propensity_score'], q=5, labels=False, duplicates='drop')
stratified_effects = []
for stratum in df['ps_stratum'].unique():
    stratum_data = df[df['ps_stratum'] == stratum]
    if ((stratum_data['treatment'] == 0).sum() > 0
            and (stratum_data['treatment'] == 1).sum() > 0):
        treated_mean = stratum_data[stratum_data['treatment'] == 1]['salary'].mean()
        control_mean = stratum_data[stratum_data['treatment'] == 0]['salary'].mean()
        stratified_effects.append(treated_mean - control_mean)
stratified_effect = np.mean(stratified_effects)
print(f"\n4. Stratification by PS: ${stratified_effect:.0f}")
```
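The listing averages the stratum effects with equal weights; with quantile strata of (nearly) equal size the result is essentially the same, but the standard stratified ATE estimator weights each stratum by its share of the sample. A sketch of the weighted version (my addition, same simulated data):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Same data-generating process as the main example (seed 42)
np.random.seed(42)
n = 1000
age = np.random.uniform(25, 75, n)
treatment = (np.random.uniform(0, 1, n) < 0.3 + 0.3 * (75 - age) / 50).astype(int)
salary = 40000 + 500 * age + 5000 * treatment + np.random.normal(0, 10000, n)
df = pd.DataFrame({'age': age, 'treatment': treatment, 'salary': salary})
df['ps'] = LogisticRegression().fit(df[['age']], df['treatment']).predict_proba(df[['age']])[:, 1]
df['stratum'] = pd.qcut(df['ps'], q=5, labels=False, duplicates='drop')

# Weight each stratum's effect by its share of the sample
effects, weights = [], []
for _, g in df.groupby('stratum'):
    if g['treatment'].nunique() == 2:  # need both arms in the stratum
        effects.append(g[g['treatment'] == 1]['salary'].mean()
                       - g[g['treatment'] == 0]['salary'].mean())
        weights.append(len(g))
ate = np.average(effects, weights=weights)
print(f"Stratified ATE (size-weighted): ${ate:.0f}")
```

Weighting by the share of *treated* units in each stratum instead would give an ATT-style estimate.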
```python
# 5. Visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Treatment distribution by age
ax = axes[0, 0]
treated = df[df['treatment'] == 1]
control = df[df['treatment'] == 0]
ax.hist(control['age'], bins=20, alpha=0.6, label='Control', color='blue')
ax.hist(treated['age'], bins=20, alpha=0.6, label='Treated', color='red')
ax.set_xlabel('Age')
ax.set_ylabel('Frequency')
ax.set_title('Age Distribution by Treatment')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')

# Salary vs age, colored by treatment
ax = axes[0, 1]
ax.scatter(control['age'], control['salary'], alpha=0.5, label='Control', s=30)
ax.scatter(treated['age'], treated['salary'], alpha=0.5, label='Treated', s=30, color='red')
ax.set_xlabel('Age')
ax.set_ylabel('Salary')
ax.set_title('Salary vs Age by Treatment')
ax.legend()
ax.grid(True, alpha=0.3)

# Propensity score distribution
ax = axes[1, 0]
ax.hist(df[df['treatment'] == 0]['propensity_score'], bins=20, alpha=0.6,
        label='Control', color='blue')
ax.hist(df[df['treatment'] == 1]['propensity_score'], bins=20, alpha=0.6,
        label='Treated', color='red')
ax.set_xlabel('Propensity Score')
ax.set_ylabel('Frequency')
ax.set_title('Propensity Score Distribution')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')

# Treatment effect comparison
ax = axes[1, 1]
methods = ['Naive', 'Regression', 'PSM', 'Stratified']
effects = [naive_effect, regression_effect, psm_effect, stratified_effect]
true_effect = 5000
ax.bar(methods, effects, color=['red', 'orange', 'yellow', 'lightgreen'],
       alpha=0.7, edgecolor='black')
ax.axhline(y=true_effect, color='green', linestyle='--', linewidth=2,
           label=f'True Effect (${true_effect:.0f})')
ax.set_ylabel('Treatment Effect ($)')
ax.set_title('Treatment Effect Estimates by Method')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
for i, effect in enumerate(effects):
    ax.text(i, effect + 200, f'${effect:.0f}', ha='center', va='bottom')

plt.tight_layout()
plt.show()
```
```python
# 6. Doubly robust (AIPW) estimation
from sklearn.ensemble import RandomForestRegressor

# Propensity score model
ps_model_dr = LogisticRegression().fit(df[['age']], df['treatment'])
ps_scores = ps_model_dr.predict_proba(df[['age']])[:, 1]

# Outcome model
outcome_model = RandomForestRegressor(n_estimators=50, random_state=42)
outcome_model.fit(df[['treatment', 'age']], df['salary'])

# Outcome predictions with treatment set to 1 and to 0 for every unit
X_treated = df[['treatment', 'age']].assign(treatment=1)
X_control = df[['treatment', 'age']].assign(treatment=0)
pred_treated = outcome_model.predict(X_treated)
pred_control = outcome_model.predict(X_control)

# AIPW estimator: outcome-model prediction plus an inverse-propensity
# correction; consistent if either model is correctly specified
t = df['treatment'].to_numpy()
y_obs = df['salary'].to_numpy()
ps_clipped = np.clip(ps_scores, 0.01, 0.99)  # guard against extreme weights
mu1 = pred_treated + t * (y_obs - pred_treated) / ps_clipped
mu0 = pred_control + (1 - t) * (y_obs - pred_control) / (1 - ps_clipped)
dr_effect = (mu1 - mu0).mean()
print(f"\n6. Doubly Robust Estimation: ${dr_effect:.0f}")
```
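For contrast with the doubly robust combination, the propensity-score-only route is the normalized (Hájek) inverse-propensity-weighting estimator. A sketch (my addition, same simulated data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same data-generating process as the main example (seed 42)
np.random.seed(42)
n = 1000
age = np.random.uniform(25, 75, n)
t = (np.random.uniform(0, 1, n) < 0.3 + 0.3 * (75 - age) / 50).astype(int)
salary = 40000 + 500 * age + 5000 * t + np.random.normal(0, 10000, n)

ps = LogisticRegression().fit(age.reshape(-1, 1), t).predict_proba(age.reshape(-1, 1))[:, 1]
ps = np.clip(ps, 0.01, 0.99)  # guard against extreme weights

# Hajek (normalized) IPW estimator of the ATE:
# reweight each arm to look like the full population
mu1 = np.sum(t * salary / ps) / np.sum(t / ps)
mu0 = np.sum((1 - t) * salary / (1 - ps)) / np.sum((1 - t) / (1 - ps))
ipw_effect = mu1 - mu0
print(f"IPW estimate: ${ipw_effect:.0f}")
```

Unlike AIPW, this estimator relies entirely on the propensity model being correct; the doubly robust version keeps a second chance via the outcome model.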
```python
# 7. Heterogeneous treatment effects by age quartile
print("\n7. Heterogeneous Treatment Effects (by Age Quartile):")
for age_q in pd.qcut(df['age'], q=4, duplicates='drop').unique():
    mask = (df['age'] >= age_q.left) & (df['age'] < age_q.right)
    stratum_data = df[mask]
    if ((stratum_data['treatment'] == 0).sum() > 0
            and (stratum_data['treatment'] == 1).sum() > 0):
        treated_mean = stratum_data[stratum_data['treatment'] == 1]['salary'].mean()
        control_mean = stratum_data[stratum_data['treatment'] == 0]['salary'].mean()
        effect = treated_mean - control_mean
        print(f"  Age {age_q.left:.0f}-{age_q.right:.0f}: ${effect:.0f}")
```
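The quartile comparison is a coarse view of heterogeneity. A simple model-based CATE estimator in the spirit of the causal forests mentioned earlier is the T-learner: fit one outcome model per arm and difference their predictions. A sketch (my addition, hedged; real causal-forest implementations such as `econml` or `grf` are more principled):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Same data-generating process as the main example (seed 42)
np.random.seed(42)
n = 1000
age = np.random.uniform(25, 75, n)
treatment = (np.random.uniform(0, 1, n) < 0.3 + 0.3 * (75 - age) / 50).astype(int)
salary = 40000 + 500 * age + 5000 * treatment + np.random.normal(0, 10000, n)
df = pd.DataFrame({'age': age, 'treatment': treatment, 'salary': salary})

# T-learner: fit separate outcome models on treated and control units
m1 = RandomForestRegressor(n_estimators=100, random_state=0)
m0 = RandomForestRegressor(n_estimators=100, random_state=0)
m1.fit(df[df['treatment'] == 1][['age']], df[df['treatment'] == 1]['salary'])
m0.fit(df[df['treatment'] == 0][['age']], df[df['treatment'] == 0]['salary'])

# CATE(x) = E[Y | T=1, x] - E[Y | T=0, x]
cate = m1.predict(df[['age']]) - m0.predict(df[['age']])
print(f"Mean CATE: ${cate.mean():.0f}  (true effect is constant at $5000)")
```

Because the simulated effect is constant, any apparent variation in `cate` here is estimation noise; on real data the per-segment estimates are the object of interest.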
```python
# 8. Sensitivity analysis: how a hidden confounder would shift the estimate
print("\n8. Sensitivity Analysis (Hidden Confounder Impact):")
for hidden_effect in [1000, 2000, 5000, 10000]:
    adjusted_effect = regression_effect - hidden_effect * 0.1
    print(f"  If hidden confounder worth ${hidden_effect}: Effect = ${adjusted_effect:.0f}")
```
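The loop above only subtracts a hypothetical bias term. A more direct demonstration (my addition, with a made-up data-generating process) is to simulate a hidden confounder `U` and compare the regression that can see it against the one that cannot:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical setup: hidden confounder U drives both treatment and salary
rng = np.random.default_rng(0)
n = 5000
u = rng.normal(0, 1, n)
treatment = (rng.uniform(0, 1, n) < 1 / (1 + np.exp(-u))).astype(int)
salary = 40000 + 5000 * treatment + 4000 * u + rng.normal(0, 5000, n)
df = pd.DataFrame({'u': u, 'treatment': treatment, 'salary': salary})

# Regression that adjusts for U recovers the true $5000 effect ...
with_u = LinearRegression().fit(df[['treatment', 'u']], df['salary']).coef_[0]
# ... while omitting U leaves the estimate badly biased upward
without_u = LinearRegression().fit(df[['treatment']], df['salary']).coef_[0]
print(f"adjusting for U: ${with_u:.0f}; omitting U: ${without_u:.0f}")
```

This is the failure mode the unconfoundedness assumption rules out: no adjustment method listed above can fix it if `U` is unobserved.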
```python
# 9. Summary table
print("\n" + "=" * 60)
print("CAUSAL INFERENCE SUMMARY")
print("=" * 60)
print(f"True Treatment Effect: ${true_effect:,.0f}")
print("\nEstimates:")
print(f"  Naive (BIASED):            ${naive_effect:,.0f}")
print(f"  Regression Adjustment:     ${regression_effect:,.0f}")
print(f"  Propensity Score Matching: ${psm_effect:,.0f}")
print(f"  Stratification:            ${stratified_effect:,.0f}")
print(f"  Doubly Robust:             ${dr_effect:,.0f}")
print("=" * 60)
```
```python
# 10. Causal graph (text representation)
print("\n10. Causal Graph (DAG):")
print("""
    Age → Treatment ← (Selection Bias)
     ↓        ↓
     └─→ Salary

Interpretation:
- Age is a confounder
- Treatment causally affects Salary
- Age directly affects Salary
- Age affects probability of Treatment
""")
```
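The same DAG can also be encoded programmatically. A minimal sketch (my addition, plain dictionaries, no graph library assumed) that identifies confounders as common ancestors of treatment and outcome:

```python
# Encode the DAG from the text drawing as an adjacency mapping
dag = {
    'Age': ['Treatment', 'Salary'],
    'Treatment': ['Salary'],
    'Salary': [],
}

def ancestors(graph, node):
    """All nodes with a directed path into `node`."""
    result = set()
    frontier = [p for p, children in graph.items() if node in children]
    while frontier:
        p = frontier.pop()
        if p not in result:
            result.add(p)
            frontier.extend(q for q, children in graph.items() if p in children)
    return result

# Confounders: common ancestors of treatment and outcome,
# excluding the treatment itself (it is an ancestor of the outcome)
confounders = (ancestors(dag, 'Treatment') & ancestors(dag, 'Salary')) - {'Treatment'}
print(confounders)  # → {'Age'}
```

On this three-node graph the answer is obvious, but the same ancestor logic scales to larger DAGs and tells you which variables must enter the adjustment set.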
- Causal Assumptions
- Unconfoundedness: No unmeasured confounders
- Overlap: Common support on propensity scores
- SUTVA (Stable Unit Treatment Value Assumption): No interference between units
- Consistency: Single version of treatment
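The overlap assumption can be checked directly on the fitted propensity scores. A sketch (my addition, same simulated data as the implementation):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Same data-generating process as the main example (seed 42)
np.random.seed(42)
n = 1000
age = np.random.uniform(25, 75, n)
treatment = (np.random.uniform(0, 1, n) < 0.3 + 0.3 * (75 - age) / 50).astype(int)
df = pd.DataFrame({'age': age, 'treatment': treatment})
df['ps'] = LogisticRegression().fit(df[['age']], df['treatment']).predict_proba(df[['age']])[:, 1]

# Common support: the propensity-score range where both arms have units
lo = max(df[df['treatment'] == 1]['ps'].min(), df[df['treatment'] == 0]['ps'].min())
hi = min(df[df['treatment'] == 1]['ps'].max(), df[df['treatment'] == 0]['ps'].max())
outside = ((df['ps'] < lo) | (df['ps'] > hi)).mean()
print(f"Common support: [{lo:.3f}, {hi:.3f}]; {outside:.1%} of units outside")
```

Units outside the common-support region are typically trimmed before matching or weighting, since no comparable unit exists in the other arm.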
- Treatment Effect Types
- ATE: Average Treatment Effect (overall population)
- ATT: Average Treatment effect on the Treated
- CATE: Conditional Average Treatment Effect
- HTE: Heterogeneous Treatment Effects
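The implementation above targets the ATE; the ATT can be estimated by reweighting controls with propensity odds so they resemble the treated population. A sketch (my addition, same simulated data, where the two coincide because the true effect is constant):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same data-generating process as the main example (seed 42)
np.random.seed(42)
n = 1000
age = np.random.uniform(25, 75, n)
t = (np.random.uniform(0, 1, n) < 0.3 + 0.3 * (75 - age) / 50).astype(int)
salary = 40000 + 500 * age + 5000 * t + np.random.normal(0, 10000, n)

ps = LogisticRegression().fit(age.reshape(-1, 1), t).predict_proba(age.reshape(-1, 1))[:, 1]
ps = np.clip(ps, 0.01, 0.99)

# ATT: treated mean minus odds-weighted control mean; the weights
# ps/(1-ps) reshape the control group to match the treated population
w = ps[t == 0] / (1 - ps[t == 0])
att = salary[t == 1].mean() - np.average(salary[t == 0], weights=w)
print(f"ATT estimate: ${att:.0f}")
```

When effects vary across units (true HTE), ATT and ATE diverge, and which one matters depends on the policy question.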
- Method Strengths
- RCT: Gold standard, controls all confounders
- Matching: Balances groups, preserves overlap
- Regression: Adjusts for covariates
- Instrumental Variables: Handles endogeneity
- Causal Forests: Learns heterogeneous effects
- Deliverables
- Causal graph visualization
- Treatment effect estimates
- Sensitivity analysis
- Heterogeneous treatment effects
- Covariate balance assessment
- Propensity score diagnostics
- Final causal inference report